Amazon Has New Frontier AI Models--and a Way for Customers to Build Their Own

WIRED

Nova Forge lets Amazon's customers train frontier models for different tasks--a potential breakthrough in making AI actually useful for businesses. Amazon has announced a new family of frontier artificial intelligence models--and a new way for customers to build frontier models of their own. The e-commerce giant announced the second generation of its Nova AI models at re:Invent, a company conference held in Las Vegas. The models are nowhere near as popular as those offered by rivals like OpenAI and Google, but Amazon's plan to make them highly customizable could see them gain traction with its cloud users. Amazon detailed two improved large language models, Nova Lite and Nova Pro; a new real-time voice model called Nova Sonic; and a more experimental model called Nova Omni that performs a simulated kind of reasoning using images, audio, and video as well as text.


Realizing value with AI inference at scale and in production

MIT Technology Review

Training an AI model to predict equipment failures is an engineering achievement. But it's not until prediction meets action--the moment that model successfully flags a malfunctioning machine--that true business transformation occurs. One technical milestone lives in a proof-of-concept deck; the other meaningfully contributes to the bottom line. Craig Partridge, senior director worldwide of Digital Next Advisory at HPE, believes the true value of AI lies in inference. "Inference is where AI earns its keep. It's the operational layer that puts all that training to use in real-world workflows."


Evaluating Vision Language Models (VLMs) for Radiology: A Comprehensive Analysis

Li, Frank, Trivedi, Hari, Khosravi, Bardia, Dapamede, Theo, Chavoshi, Mohammadreza, Dere, Abdulhameed, Isaac, Rohan Satya, Mansuri, Aawez, Newsome, Janice, Purkayastha, Saptarshi, Gichoya, Judy

arXiv.org Artificial Intelligence

Foundation models, trained on vast amounts of data using self-supervised techniques, have emerged as a promising frontier for advancing artificial intelligence (AI) applications in medicine. This study evaluates three different vision-language foundation models (RAD-DINO, CheXagent, and BiomedCLIP) on their ability to capture fine-grained imaging features for radiology tasks. The models were assessed across classification, segmentation, and regression tasks for pneumothorax and cardiomegaly on chest radiographs. Self-supervised RAD-DINO consistently excelled in segmentation tasks, while text-supervised CheXagent demonstrated superior classification performance. BiomedCLIP showed inconsistent performance across tasks. A custom segmentation model that integrates global and local features substantially improved performance for all foundation models, particularly for challenging pneumothorax segmentation. The findings highlight that pre-training methodology significantly influences model performance on specific downstream tasks. For fine-grained segmentation tasks, models trained without text supervision performed better, while text-supervised models offered advantages in classification and interpretability. These insights provide guidance for selecting foundation models based on specific clinical applications in radiology.


Improving Speech Recognition Accuracy Using Custom Language Models with the Vosk Toolkit

Soni, Aniket Abhishek

arXiv.org Artificial Intelligence

Although speech recognition algorithms have developed quickly in recent years, achieving high transcription accuracy across diverse audio formats and acoustic environments remains a major challenge. This work explores how incorporating custom language models with the open-source Vosk Toolkit can improve speech-to-text accuracy in varied settings. Unlike many conventional systems limited to specific audio types, this approach supports multiple audio formats such as WAV, MP3, FLAC, and OGG by using Python modules for preprocessing and format conversion. A Python-based transcription pipeline was developed to process input audio, perform speech recognition using Vosk's KaldiRecognizer, and export the output to a DOCX file. Results showed that custom models reduced word error rates, especially in domain-specific scenarios involving technical terminology, varied accents, or background noise. This work presents a cost-effective, offline solution for high-accuracy transcription and opens up future opportunities for automation and real-time applications.
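The word error rate (WER) that the abstract uses to compare models is a standard metric: the token-level edit distance between a reference transcript and the recognizer's output, divided by the reference length. A minimal sketch of that computation (illustrative only, not the paper's evaluation code):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference length,
    computed via Levenshtein distance over whitespace-separated tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference tokens
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis tokens
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match / substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `word_error_rate("the cat sat", "the bat sat")` gives one substitution over three reference words, i.e. roughly 0.33.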


BeST -- A Novel Source Selection Metric for Transfer Learning

Soni, Ashutosh, Ju, Peizhong, Eryilmaz, Atilla, Shroff, Ness B.

arXiv.org Machine Learning

One of the most fundamental, and yet relatively less explored, goals in transfer learning is the efficient means of selecting top candidates from a large number of previously trained models (optimized for various "source" tasks) that would perform the best for a new "target" task with a limited amount of data. In this paper, we undertake this goal by developing a novel task-similarity metric (BeST) and an associated method that consistently performs well in identifying the most transferable source(s) for a given task. In particular, our design employs an innovative quantization-level optimization procedure in the context of classification tasks that yields a measure of similarity between a source model and the given target data. The procedure uses a concept similar to early stopping (usually implemented to train deep neural networks (DNNs) to ensure generalization) to derive a function that approximates the transfer learning mapping without training. The advantage of our metric is that it can be quickly computed to identify the top candidate(s) for a given target task before a computationally intensive transfer operation (typically using DNNs) can be implemented between the selected source and the target task. As such, our metric can provide significant computational savings for transfer learning from a selection of a large number of possible source models. Through extensive experimental evaluations, we establish that our metric performs well over different datasets and varying numbers of data samples. Transfer learning (Pan and Yang, 2010; Weiss et al., 2016) is a method to increase the efficacy of learning a target task by transferring the knowledge contained in a different but related source task. It is known that the effectiveness of supervised learning depends on the amount of labeled data.
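The early-stopping concept the abstract borrows is standard in DNN training: halt when a validation metric stops improving. As background only, a generic patience-based criterion looks like the sketch below (this illustrates the classical technique, not the paper's BeST procedure):

```python
class EarlyStopping:
    """Signal a stop when validation loss fails to improve
    by at least `min_delta` for `patience` consecutive checks."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, val_loss: float) -> bool:
        """Record one validation result; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience
```

In a training loop, `step()` would be called after each validation pass, breaking out of the loop once it returns `True`; BeST repurposes this kind of stopping signal to approximate the transfer mapping without actually training.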


TinyLLM: A Framework for Training and Deploying Language Models at the Edge Computers

Kandala, Savitha Viswanadh, Medaranga, Pramuka, Varshney, Ambuj

arXiv.org Artificial Intelligence

Language models have gained significant interest due to their general-purpose capabilities, which appear to emerge as models are scaled to increasingly larger parameter sizes. However, these large models impose stringent requirements on computing systems, demanding significant memory and processing capacity for inference. This makes performing inference on mobile and edge devices challenging, often requiring invoking remotely hosted models via network calls. Remote inference, in turn, introduces issues like latency, unreliable network connectivity, and privacy concerns. To address these challenges, we explored the possibility of deviating from the trend of increasing model size. Instead, we hypothesize that much smaller models (~30-120M parameters) can outperform their larger counterparts for specific tasks by carefully curating the data used for pre-training and fine-tuning. We investigate this within the context of deploying edge-device models to support sensing applications. We trained several foundational models through a systematic study and found that small models can run locally on edge devices, achieving high token rates and accuracy. Based on these findings, we developed a framework that allows users to train foundational models tailored to their specific applications and deploy them at the edge.


Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis

Yao, Zebin, Feng, Fangxiang, Li, Ruifan, Wang, Xiaojie

arXiv.org Artificial Intelligence

The customization of text-to-image models has seen significant advancements, yet generating multiple personalized concepts remains a challenging task. Current methods struggle with attribute leakage and layout confusion when handling multiple concepts, leading to reduced concept fidelity and semantic consistency. In this work, we introduce a novel training-free framework, Concept Conductor, designed to ensure visual fidelity and correct layout in multi-concept customization. Concept Conductor isolates the sampling processes of multiple custom models to prevent attribute leakage between different concepts and corrects erroneous layouts through self-attention-based spatial guidance. Additionally, we present a concept injection technique that employs shape-aware masks to specify the generation area for each concept. This technique injects the structure and appearance of personalized concepts through feature fusion in the attention layers, ensuring harmony in the final image. Extensive qualitative and quantitative experiments demonstrate that Concept Conductor can consistently generate composite images with accurate layouts while preserving the visual details of each concept. Compared to existing baselines, Concept Conductor shows significant performance improvements. Our method supports the combination of any number of concepts and maintains high fidelity even when dealing with visually similar concepts. The code and models are available at https://github.com/Nihukat/Concept-Conductor.


I2AM: Interpreting Image-to-Image Latent Diffusion Models via Attribution Maps

Park, Junseo, Jang, Hyeryung

arXiv.org Artificial Intelligence

Large-scale diffusion models have made significant advancements in the field of image generation, especially through the use of cross-attention mechanisms that guide image formation based on textual descriptions. While the analysis of text-guided cross-attention in diffusion models has been extensively studied in recent years, its application in image-to-image diffusion models remains underexplored. This paper introduces the Image-to-Image Attribution Maps (I2AM) method, which aggregates patch-level cross-attention scores to enhance the interpretability of latent diffusion models across time steps, heads, and attention layers. I2AM facilitates detailed image-to-image attribution analysis, enabling observation of how diffusion models prioritize key features over time and across heads during the image generation process from reference images. Through extensive experiments, we first visualize the attribution maps of both generated and reference images, verifying that critical information from the reference image is effectively incorporated into the generated image, and vice versa. To further assess our understanding, we introduce a new evaluation metric tailored for reference-based image inpainting tasks. This metric, measuring the consistency between the attribution maps of generated and reference images, shows a strong correlation with established performance metrics for inpainting tasks, validating the potential use of I2AM in future research endeavors.
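The aggregation I2AM describes can be illustrated generically: given patch-level cross-attention scores indexed by time step and head, averaging over those axes yields a single attribution map. A minimal NumPy sketch under assumed tensor shapes (the `(timesteps, heads, H, W)` layout and the normalization are assumptions for illustration, not the authors' implementation):

```python
import numpy as np

def aggregate_attention(maps: np.ndarray) -> np.ndarray:
    """Collapse cross-attention scores into one attribution map.

    maps: array of shape (timesteps, heads, H, W) holding patch-level
    attention scores. Returns an (H, W) map normalized to sum to 1.
    """
    agg = maps.mean(axis=(0, 1))  # average over time steps and heads
    total = agg.sum()
    return agg / total if total > 0 else agg
```

Slicing `maps` before averaging (e.g. a single time step or head) gives the per-step and per-head views the paper visualizes to track how attention to the reference image evolves.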


Assessing Cardiomegaly in Dogs Using a Simple CNN Model

Deekonda, Nikhil

arXiv.org Artificial Intelligence

This paper introduces DogHeart, a dataset comprising 1400 training, 200 validation, and 400 test images categorized as small, normal, and large based on VHS score. A custom CNN model is developed, featuring a straightforward architecture with 4 convolutional layers and 4 fully connected layers. Despite the absence of data augmentation, the model achieves a 72% accuracy in classifying cardiomegaly severity. The study contributes to automated assessment of cardiac conditions in dogs, highlighting the potential for early detection and intervention in veterinary care.


AI development expected to 'explode' in 2024, experts say

FOX News

As wildfire activity reaches record levels, the tech integration company SAIC is developing artificial intelligence technology that can help predict when they'll happen, how to stop them, and how to keep folks safe. Artificial intelligence made a big splash with consumers and regulators alike in 2023, with experts believing the continued development of the technology will reach even greater heights in 2024. "I think that in 2024, AI will move a little closer to what is in the public imagination, but we remain years from AI being autonomous in the way people are imagining it," Christopher Alexander, chief analytics officer of Pioneer Development Group, told Fox News Digital. Alexander's comments come after 2023 saw a noticeable leap in the development and availability of AI tools, with popular large language model (LLM) platforms such as OpenAI's ChatGPT gaining huge popularity and energizing other tech giants to come along for the ride. Artificial intelligence will gain new capabilities in 2024.